Abstract. The product probability property, known in the literature as statistical
independence, is examined first. Then generalized entropies are introduced, all
of which are generalizations of Shannon entropy. It is shown that the nature
of the recursivity postulate automatically determines the logarithmic functional
form for Shannon entropy. Due to the logarithmic nature, Shannon entropy
naturally gives rise to additivity when applied to situations having the product
probability property. It is argued that the natural process is non-additivity,
important, for example, in statistical mechanics (Tsallis 2004, Cohen 2005),
even in product probability property situations, and that additivity can hold due to
the involvement of a recursivity postulate leading to a logarithmic function.
Generalized entropies are introduced and some of their properties are examined.
Situations are examined where a generalized entropy of order $\alpha$ leads to
pathway models, exponential and power law behavior and related differential
equations. The connection of this entropy to Kerridge's measure of "inaccuracy" is
also explored.

1. Introduction 

Mathai and Rathie (1975) consider various generalizations of Shannon entropy
(Shannon, 1948), called entropies of order $\alpha$, and give various properties,
including the additivity property, and characterization theorems. Recently, Mathai
and Haubold (2006, 2006a) explored a generalized entropy of order $\alpha$, which
is connected to a measure of uncertainty in a probability scheme, Kerridge's
(Kerridge, 1961) concept of inaccuracy in a scheme, and pathway models that
are considered in this paper.

As defined in Mathai and Haubold (2006, 2006a), the entropy $M_{k,\alpha}(P)$ is a
non-additive entropy and the measure $M^*_{k,\alpha}(P)$ is an additive entropy. It is also
shown that maximization of the continuous analogue of $M_{k,\alpha}(P)$, denoted by
$M_\alpha(f)$, gives rise to various functional forms for $f$, depending upon the types
of constraints on $f$.
Occasionally, emphasis is placed on the fact that Shannon entropy satisfies
the additivity property, leading to extensivity. It will be shown that when
the product probability property (PPP) holds, a logarithmic function can
give a sum, and that a logarithmic function enters into Shannon entropy due to the
assumption introduced through a certain type of recursivity postulate. The
concept of statistical independence will be examined in Section 1 to illustrate
that one need not expect additivity to hold simply because of the PPP, nor
should one expect the PPP to lead to extensivity. The types of
non-extensivity associated with a number of generalized entropies are pointed out
even when the PPP holds. The nature of non-extensivity that can be expected
from a multivariate distribution, when the PPP holds or when there is statistical
independence of the random variables, is illustrated by taking a trivariate case.

The maximum entropy principle is examined in Section 2. It is shown that
optimization of measures of entropies, in the continuous populations, under
selected constraints, leads to various types of models. It is shown that the
generalized entropy of order $\alpha$ is a convenient one to obtain various probability
models.

Section 3 examines the types of differential equations satisfied by the various 
special cases of the pathway model. 

1.1. Product probability property (PPP) or statistical independence 
of events 

Let $P(A)$ denote the probability of the event $A$. If $P(A \cap B) =
P(A)P(B)$ is taken as the definition of independence of the events $A$ and $B$, then
any event $A \subseteq S$ and the sure event $S$ are independent, since
$P(A \cap S) = P(A) = P(A)P(S)$. But $A$ is contained in $S$,
and then the definition of independence becomes inconsistent with the common
man's vision of independence. Even if the trivial cases of the sure event $S$ and
the impossible event $\phi$ are deleted, this definition is still a consequence of
some properties of positive numbers. Consider a sample space of $n$ distinct
elementary events. If symmetry in the outcomes is assumed then we will assign
equal probabilities $\frac{1}{n}$ each to the elementary events. Let $C = A \cap B$. If $A$ and
$B$ are independent then $P(C) = P(A)P(B)$. Let

$$P(A) = \frac{x}{n}, \quad P(B) = \frac{y}{n}, \quad P(C) = \frac{z}{n}.$$

Then

$$\left(\frac{x}{n}\right)\left(\frac{y}{n}\right) = \frac{z}{n} \;\Rightarrow\; nz = xy, \quad x, y, z = 1, 2, \ldots, n-1, \quad z < x, y, \tag{1}$$

deleting $S$ and $\phi$. There is no solution for $x, y, z$ for a large number of $n$, for
example, $n = 3, 5, 7$. This means that there are no independent events in such
cases, which sounds strange from a common man's point of view.
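
The claim about (1) can be checked by brute force. The following is a minimal
sketch, assuming equally likely elementary events so that only the counts
x, y, z matter; the function name ppp_solutions is hypothetical and is not
taken from the paper.

```python
def ppp_solutions(n):
    """Return all (x, y, z) with n*z == x*y and 1 <= x, y, z <= n - 1.

    Since x, y < n, any integer z = x*y/n is automatically smaller than
    both x and y, but the condition z < x, y is still checked explicitly.
    """
    solutions = []
    for x in range(1, n):
        for y in range(1, n):
            if (x * y) % n == 0:
                z = x * y // n
                if 1 <= z < min(x, y):
                    solutions.append((x, y, z))
    return solutions


if __name__ == "__main__":
    for n in (3, 5, 7, 4, 6):
        print(n, ppp_solutions(n))
```

For prime n the list is empty, because n cannot divide a product of two
positive integers that are both smaller than n. Composite n such as 4 or 6 do
admit solutions.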

The term "independence" of events is a misnomer. This property should 
have been called product probability property or PPP of events. There is no 
reason to expect the information or entropy in a joint distribution to be the sum 
of the information contents of the marginal distributions when the PPP holds
for the distributions, that is, when the joint density or probability function is
a product of the marginal densities or probability functions. We may expect a
term due to the product probability to enter into the expression for the entropy
in the joint distribution in such cases. But if the information or entropy is
defined in terms of a logarithm, then naturally, the logarithm of a product being
the sum of logarithms, we can expect a sum in such situations. This is
not due to independence or to the PPP of the densities, but due to the fact
that a functional involving a logarithm is taken, whereby a product becomes
a sum. Hence not too much importance should be put on whether or not the
entropy of the joint distribution becomes the sum of the entropies of the marginal
distributions, that is, on the additivity property, when the PPP holds.

1.2. How is logarithm coming in Shannon's entropy? 

Several characterization theorems for Shannon entropy and its various
generalizations are given in Mathai and Rathie (1975). Modified and refined versions
of Shannon's own postulates are given as postulates for the first theorem
characterizing Shannon entropy in Mathai and Rathie (1975). Apart from continuity,
symmetry, zero-indifference and normalization postulates, the main postulate
in the theorem is a recursivity postulate, which in essence says that when the
PPP holds then the entropy will be a weighted sum of the entropies, thus in
effect assuming a logarithmic functional form. The crucial postulate is stated
here. Consider a multinomial population $P = (p_1, \ldots, p_m)$, $p_i > 0$, $i = 1, \ldots, m$,
$p_1 + \cdots + p_m = 1$, that is, $p_i = P(A_i)$, $i = 1, \ldots, m$, $A_1 \cup \cdots \cup A_m = S$,
$A_i \cap A_j = \phi$, $i \neq j$. If any $p_i$ can also take the value zero then the zero-indifference
postulate, namely that the entropy remains the same when an impossible event
is incorporated into the scheme, is to be added. Let $H_n(p_1, \ldots, p_n)$ denote the
entropy to be defined. Then the crucial recursivity postulate says that

$$H_n(p_1, \ldots, p_{m-1}, p_m q_1, \ldots, p_m q_{n-m+1}) = H_m(p_1, \ldots, p_m) + p_m H_{n-m+1}(q_1, \ldots, q_{n-m+1}), \tag{2}$$

where $\sum_{i=1}^{m} p_i = 1$ and $\sum_{j=1}^{n-m+1} q_j = 1$. This says that if the $m$-th event $A_m$ is
partitioned into independent events, $P(A_m \cap B_j) = P(A_m)P(B_j) = p_m q_j$, $j =
1, \ldots, n-m+1$, so that $p_m = p_m q_1 + \cdots + p_m q_{n-m+1}$, then the entropy $H_n(\cdot)$
becomes a weighted sum. Naturally, the result will be a logarithmic function
for the measure of entropy.
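
That Shannon entropy satisfies the recursivity postulate (2) can be illustrated
numerically. The sketch below assumes the constant A = 1 and natural
logarithms; the helper names shannon and recursivity_gap are hypothetical.

```python
import math


def shannon(p):
    """Shannon entropy with A = 1 and natural logarithm; 0 ln 0 is taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def recursivity_gap(p, q):
    """Left side minus right side of the recursivity postulate (2).

    The last cell p[-1] is split into the cells p[-1]*q[0], ..., p[-1]*q[-1].
    """
    refined = list(p[:-1]) + [p[-1] * qj for qj in q]
    lhs = shannon(refined)
    rhs = shannon(p) + p[-1] * shannon(q)
    return lhs - rhs


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.1, 0.6, 0.3]
    print(abs(recursivity_gap(p, q)) < 1e-12)
```

The gap vanishes up to rounding because ln(p_m q_j) = ln p_m + ln q_j, which is
exactly the mechanism by which the logarithm turns a product into a sum.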

There are several modifications to this crucial recursivity postulate. One
suggested by Tverberg is that $n - m + 1 = 2$ and $q_1 = q$, $q_2 = 1 - q$, $0 < q < 1$,
and $H_2(q, 1 - q)$ is assumed to be Lebesgue integrable in $0 < q < 1$. Again
a characterization of Shannon entropy is obtained. In all the characterization
theorems for Shannon entropy this recursivity property enters in one form or the
other as a postulate, which in effect implies a logarithmic form for the entropy
measure. Shannon entropy $S_k$ has the following form:

$$S_k = -A \sum_{i=1}^{k} p_i \ln p_i, \quad p_i > 0, \; i = 1, \ldots, k, \; p_1 + \cdots + p_k = 1, \tag{3}$$

where $A$ is a constant. If any $p_i$ is assumed to be zero then $0 \ln 0$ is to be
interpreted as zero. Since the constant $A$ is present, the logarithm can be taken to
any base. Usually the logarithm is taken to the base 2 for ready application to
binary systems. We will take the logarithm to the base $e$.

1.3. Generalization of Shannon entropy 

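Consider again a multinomial population $P = (p_1, \ldots, p_k)$, $p_i > 0$, $i =
1, \ldots, k$, $p_1 + \cdots + p_k = 1$. The following are some of the generalizations of
Shannon entropy $S_k$.

$$R_{k,\alpha}(P) = \frac{\ln\left(\sum_{i=1}^{k} p_i^{\alpha}\right)}{1 - \alpha}, \quad \alpha \neq 1, \; \alpha > 0 \tag{4}$$

(Renyi entropy of order $\alpha$ of 1961)

$$H_{k,\alpha}(P) = \frac{\sum_{i=1}^{k} p_i^{\alpha} - 1}{2^{1-\alpha} - 1}, \quad \alpha \neq 1, \; \alpha > 0 \tag{5}$$

(Havrda-Charvat entropy of order $\alpha$ of 1967)

$$T_{k,\alpha}(P) = \frac{\sum_{i=1}^{k} p_i^{\alpha} - 1}{1 - \alpha}, \quad \alpha \neq 1, \; \alpha > 0 \tag{6}$$

(Tsallis entropy of 1988)

$$M_{k,\alpha}(P) = \frac{\sum_{i=1}^{k} p_i^{2-\alpha} - 1}{\alpha - 1}, \quad \alpha \neq 1, \; -\infty < \alpha < 2 \tag{7}$$

(entropic form of order $\alpha$)

$$M^*_{k,\alpha}(P) = \frac{\ln\left(\sum_{i=1}^{k} p_i^{2-\alpha}\right)}{\alpha - 1}, \quad \alpha \neq 1, \; -\infty < \alpha < 2 \tag{8}$$

(additive entropic form of order $\alpha$).

When $\alpha \to 1$ all the entropies of order $\alpha$ described above in (4) to (8) go to
Shannon entropy $S_k$:

$$\lim_{\alpha \to 1} R_{k,\alpha}(P) = \lim_{\alpha \to 1} H_{k,\alpha}(P) = \lim_{\alpha \to 1} T_{k,\alpha}(P) = \lim_{\alpha \to 1} M_{k,\alpha}(P) = \lim_{\alpha \to 1} M^*_{k,\alpha}(P) = S_k. \tag{9}$$

Hence all the above measures are called generalized entropies of order $\alpha$.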
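
The limits in (9) can be illustrated numerically by evaluating each measure
close to order 1. This is only a sketch: the function names are hypothetical,
and the value 1 + 1e-6 is an arbitrary choice. One detail worth noting is that
the Havrda-Charvat form tends to Shannon entropy with the constant
A = 1/ln 2 (logarithm to the base 2), while the other four tend to it with
A = 1.

```python
import math


def shannon(p):
    """Shannon entropy (3) with A = 1 and natural logarithm."""
    return -sum(x * math.log(x) for x in p)


def renyi(p, a):
    """Renyi entropy (4)."""
    return math.log(sum(x ** a for x in p)) / (1 - a)


def havrda_charvat(p, a):
    """Havrda-Charvat entropy (5)."""
    return (sum(x ** a for x in p) - 1) / (2 ** (1 - a) - 1)


def tsallis(p, a):
    """Tsallis entropy (6)."""
    return (sum(x ** a for x in p) - 1) / (1 - a)


def entropic_form(p, a):
    """Entropic form of order a, equation (7)."""
    return (sum(x ** (2 - a) for x in p) - 1) / (a - 1)


def additive_entropic_form(p, a):
    """Additive entropic form of order a, equation (8)."""
    return math.log(sum(x ** (2 - a) for x in p)) / (a - 1)


if __name__ == "__main__":
    p = [0.1, 0.2, 0.3, 0.4]
    a = 1 + 1e-6  # order close to 1
    print("Shannon               ", shannon(p))
    print("Renyi                 ", renyi(p, a))
    print("Havrda-Charvat * ln 2 ", havrda_charvat(p, a) * math.log(2))
    print("Tsallis               ", tsallis(p, a))
    print("entropic form         ", entropic_form(p, a))
    print("additive entropic form", additive_entropic_form(p, a))
```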

Let us examine what happens to the above entropies in the case of a
joint distribution. Let $p_{ij} > 0$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ such that
$\sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} = 1$. This is a bivariate situation of a discrete distribution. Then the entropy in
the joint distribution is, for example,

$$M_{m,n,\alpha}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij}^{2-\alpha} - 1}{\alpha - 1}. \tag{10}$$

If the PPP holds, that is, if $p_{ij} = p_i q_j$, $p_1 + \cdots + p_m = 1$, $q_1 + \cdots + q_n = 1$,
$p_i > 0$, $i = 1, \ldots, m$, $q_j > 0$, $j = 1, \ldots, n$, and if $P = (p_1, \ldots, p_m)$, $Q = (q_1, \ldots, q_n)$,
then

$$(\alpha - 1) M_{m,\alpha}(P) M_{n,\alpha}(Q) = \frac{1}{\alpha - 1} \left( \sum_{i=1}^{m} p_i^{2-\alpha} - 1 \right) \left( \sum_{j=1}^{n} q_j^{2-\alpha} - 1 \right)$$

$$= \frac{1}{\alpha - 1} \left[ \sum_{i=1}^{m} \sum_{j=1}^{n} p_i^{2-\alpha} q_j^{2-\alpha} - \sum_{i=1}^{m} p_i^{2-\alpha} - \sum_{j=1}^{n} q_j^{2-\alpha} + 1 \right]$$

$$= M_{m,n,\alpha}(P, Q) - M_{m,\alpha}(P) - M_{n,\alpha}(Q).$$

Therefore

$$M_{m,n,\alpha}(P, Q) = M_{m,\alpha}(P) + M_{n,\alpha}(Q) + (\alpha - 1) M_{m,\alpha}(P) M_{n,\alpha}(Q). \tag{11}$$

If any one of the above mentioned generalized entropies in (4) to (8) is written
as $F_{m,n,\alpha}(P, Q)$ then we have the relation

$$F_{m,n,\alpha}(P, Q) = F_{m,\alpha}(P) + F_{n,\alpha}(Q) + a(\alpha) F_{m,\alpha}(P) F_{n,\alpha}(Q), \tag{12}$$

where

$$a(\alpha) = \begin{cases} 0 & \text{(Renyi entropy } R_{k,\alpha}(P)) \\ 2^{1-\alpha} - 1 & \text{(Havrda-Charvat entropy } H_{k,\alpha}(P)) \\ 1 - \alpha & \text{(Tsallis entropy } T_{k,\alpha}(P)) \\ \alpha - 1 & \text{(entropic form of order } \alpha \text{, i.e., } M_{k,\alpha}(P)) \\ 0 & \text{(additive entropic form of order } \alpha \text{, i.e., } M^*_{k,\alpha}(P)). \end{cases} \tag{13}$$

When $a(\alpha) = 0$ the entropy is called additive and when $a(\alpha) \neq 0$ the entropy
is called non-additive. As can be expected, when a logarithmic function is
involved, as in the cases of $S_k(P)$, $R_{k,\alpha}(P)$, $M^*_{k,\alpha}(P)$, the entropy is additive
and $a(\alpha) = 0$.
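
Relation (12) with the coefficients in (13) can be checked numerically for a
product distribution. The following is a minimal sketch; the dictionary
MEASURES and the function additivity_gap are hypothetical names introduced
only for this illustration.

```python
import math
from itertools import product


def renyi(p, a):
    return math.log(sum(x ** a for x in p)) / (1 - a)


def havrda_charvat(p, a):
    return (sum(x ** a for x in p) - 1) / (2 ** (1 - a) - 1)


def tsallis(p, a):
    return (sum(x ** a for x in p) - 1) / (1 - a)


def entropic_form(p, a):
    return (sum(x ** (2 - a) for x in p) - 1) / (a - 1)


def additive_entropic_form(p, a):
    return math.log(sum(x ** (2 - a) for x in p)) / (a - 1)


# Each entry pairs an entropy of order alpha with its coefficient a(alpha) from (13).
MEASURES = {
    "Renyi": (renyi, lambda a: 0.0),
    "Havrda-Charvat": (havrda_charvat, lambda a: 2 ** (1 - a) - 1),
    "Tsallis": (tsallis, lambda a: 1 - a),
    "entropic form": (entropic_form, lambda a: a - 1),
    "additive entropic form": (additive_entropic_form, lambda a: 0.0),
}


def additivity_gap(name, p, q, alpha):
    """Left side minus right side of (12) when the joint law is p_i * q_j."""
    F, coeff = MEASURES[name]
    joint = [pi * qj for pi, qj in product(p, q)]
    Fp, Fq = F(p, alpha), F(q, alpha)
    return F(joint, alpha) - (Fp + Fq + coeff(alpha) * Fp * Fq)


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.4, 0.6]
    for name in MEASURES:
        print(name, abs(additivity_gap(name, p, q, 1.5)) < 1e-12)
```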

1.4. Extensions to higher dimensional joint distributions 

Consider a trivariate population or a trivariate discrete distribution $p_{ijk} >
0$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, $k = 1, \ldots, r$ such that $\sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} p_{ijk} = 1$. If
the PPP holds mutually, that is, pair-wise as well as jointly, then

$$p_{ijk} = p_i q_j s_k, \quad \sum_{i=1}^{m} p_i = 1, \quad \sum_{j=1}^{n} q_j = 1, \quad \sum_{k=1}^{r} s_k = 1,$$

$$P = (p_1, \ldots, p_m), \quad Q = (q_1, \ldots, q_n), \quad S = (s_1, \ldots, s_r).$$

Then proceeding as before, we have for any of the measures described above in
(4) to (8), calling it $F(\cdot)$,

$$F_{m,n,r,\alpha}(P, Q, S) = F_{m,\alpha}(P) + F_{n,\alpha}(Q) + F_{r,\alpha}(S) + a(\alpha) \left[ F_{m,\alpha}(P) F_{n,\alpha}(Q) + F_{m,\alpha}(P) F_{r,\alpha}(S) + F_{n,\alpha}(Q) F_{r,\alpha}(S) \right] + [a(\alpha)]^2 F_{m,\alpha}(P) F_{n,\alpha}(Q) F_{r,\alpha}(S) \tag{14}$$

where $a(\alpha)$ is the same as in (13). The same procedure can be extended to any
multivariable situation. If $a(\alpha) = 0$ we may call the entropy additive and if
$a(\alpha) \neq 0$ then the entropy is non-additive.
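
The trivariate relation (14) can be checked in the same way. The sketch below
is restricted, as an assumption made for brevity, to the two non-additive
measures (6) and (7); the function trivariate_gap is a hypothetical helper.

```python
from itertools import product


def tsallis(p, a):
    """Tsallis entropy (6)."""
    return (sum(x ** a for x in p) - 1) / (1 - a)


def entropic_form(p, a):
    """Entropic form of order a, equation (7)."""
    return (sum(x ** (2 - a) for x in p) - 1) / (a - 1)


def trivariate_gap(F, coeff, p, q, s, alpha):
    """Left side minus right side of (14) for the joint law p_i * q_j * s_k.

    coeff is the value of a(alpha) from (13) for the chosen measure F.
    """
    joint = [pi * qj * sk for pi, qj, sk in product(p, q, s)]
    Fp, Fq, Fs = F(p, alpha), F(q, alpha), F(s, alpha)
    rhs = (Fp + Fq + Fs
           + coeff * (Fp * Fq + Fp * Fs + Fq * Fs)
           + coeff ** 2 * Fp * Fq * Fs)
    return F(joint, alpha) - rhs


if __name__ == "__main__":
    p, q, s = [0.2, 0.8], [0.1, 0.3, 0.6], [0.5, 0.5]
    alpha = 1.4
    print(abs(trivariate_gap(tsallis, 1 - alpha, p, q, s, alpha)) < 1e-12)
    print(abs(trivariate_gap(entropic_form, alpha - 1, p, q, s, alpha)) < 1e-12)
```

Both gaps vanish because, for these measures, 1 + a(alpha) F factorizes over
the three marginals under the PPP, and expanding the product gives the
right side of (14).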

1.5. Crucial recursivity postulate 

Consider the multinomial population $P = (p_1, \ldots, p_k)$, $p_i \geq 0$, $i = 1, \ldots, k$, $p_1 +
\cdots + p_k = 1$. Let the entropy measure to be determined through appropriate
postulates be denoted by $H_k(P) = H_k(p_1, \ldots, p_k)$. For $k = 2$ let

$$f(x) = H_2(x, 1 - x), \quad 0 \leq x \leq 1 \text{ or } x \in [0, 1]. \tag{15}$$

If another parameter $\alpha$ is to be involved in $H_2(x, 1 - x)$ then we will denote $f(x)$
by $f_\alpha(x)$. From (5) to (7) it can be seen that the generalized entropies of order
$\alpha$ of Havrda-Charvat (1967), Tsallis (1988, 2004) and Shannon (1948) entropy
satisfy the functional equation

$$f_\alpha(x) + b_\alpha(x) f_\alpha\left(\frac{y}{1 - x}\right) = f_\alpha(y) + b_\alpha(y) f_\alpha\left(\frac{x}{1 - y}\right) \tag{16}$$

for $x, y \in [0, 1)$ with $x + y \in [0, 1]$, with the boundary condition

$$f_\alpha(0) = f_\alpha(1) \tag{17}$$

where

$$b_\alpha(x) = \begin{cases} 1 - x & \text{(Shannon entropy } S_k(P)) \\ (1 - x)^{\alpha} & \text{(Havrda-Charvat entropy } H_{k,\alpha}(P)) \\ (1 - x)^{\alpha} & \text{(Tsallis entropy } T_{k,\alpha}(P)) \\ (1 - x)^{2-\alpha} & \text{(entropic form of order } \alpha \text{, i.e., } M_{k,\alpha}(P)). \end{cases} \tag{18}$$

Observe that the normalizing constant at $x = \frac{1}{2}$ is equal to 1 for $H_{k,\alpha}(P)$ and it
is different for other entropies. Thus equations (16), (17), (18), with the appropriate
normalizing constants $f_\alpha(\frac{1}{2})$, can give characterization theorems for the various
entropy measures. The form of $b_\alpha(x)$ comes from the crucial recursivity
postulate, assumed as a desirable property for the measures.
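
The functional equation (16) can be verified numerically at interior points
for the two-cell versions of the measures. This is a sketch under the
assumption that the second weight on the right side of (16) is evaluated at y;
all function names are hypothetical.

```python
import math


def f_shannon(x):
    """H_2(x, 1 - x) for Shannon entropy with A = 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def f_tsallis(x, a):
    """H_2(x, 1 - x) for Tsallis entropy (6)."""
    return (x ** a + (1 - x) ** a - 1) / (1 - a)


def f_entropic(x, a):
    """H_2(x, 1 - x) for the entropic form (7)."""
    return (x ** (2 - a) + (1 - x) ** (2 - a) - 1) / (a - 1)


def functional_gap(f, b, x, y):
    """Left side minus right side of (16) for a given f and weight function b."""
    lhs = f(x) + b(x) * f(y / (1 - x))
    rhs = f(y) + b(y) * f(x / (1 - y))
    return lhs - rhs


if __name__ == "__main__":
    x, y, a = 0.2, 0.3, 1.6
    print(abs(functional_gap(f_shannon, lambda t: 1 - t, x, y)) < 1e-12)
    print(abs(functional_gap(lambda t: f_tsallis(t, a),
                             lambda t: (1 - t) ** a, x, y)) < 1e-12)
    print(abs(functional_gap(lambda t: f_entropic(t, a),
                             lambda t: (1 - t) ** (2 - a), x, y)) < 1e-12)
```

In each case both sides reduce to the three-cell entropy of (x, y, 1 - x - y),
which is symmetric in x and y. The Havrda-Charvat form differs from the
Tsallis form only by a constant factor, so it satisfies the same equation.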

1.6. Continuous analogues 

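In the continuous case let $f(x)$ be the density function of a real random
variable $x$. Then the various entropy measures, corresponding to the ones in (4)
to (8), are the following:

$$R_\alpha(f) = \frac{\ln\left(\int_{-\infty}^{\infty} [f(x)]^{\alpha} dx\right)}{1 - \alpha}, \quad \alpha \neq 1, \; \alpha > 0 \tag{19}$$

(Renyi entropy of order $\alpha$)

$$H_\alpha(f) = \frac{\int_{-\infty}^{\infty} [f(x)]^{\alpha} dx - 1}{2^{1-\alpha} - 1}, \quad \alpha \neq 1, \; \alpha > 0 \tag{20}$$

(Havrda-Charvat entropy of order $\alpha$)

$$T_\alpha(f) = \frac{\int_{-\infty}^{\infty} [f(x)]^{\alpha} dx - 1}{1 - \alpha}, \quad \alpha \neq 1, \; \alpha > 0 \tag{21}$$

(Tsallis entropy of order $\alpha$)

$$M_\alpha(f) = \frac{\int_{-\infty}^{\infty} [f(x)]^{2-\alpha} dx - 1}{\alpha - 1}, \quad \alpha \neq 1, \; \alpha < 2 \tag{22}$$

(entropic form of order $\alpha$)

$$M^*_\alpha(f) = \frac{\ln\left(\int_{-\infty}^{\infty} [f(x)]^{2-\alpha} dx\right)}{\alpha - 1}, \quad \alpha \neq 1, \; \alpha < 2 \tag{23}$$

(additive entropic form of order $\alpha$).

As expected, Shannon entropy in this case is given by

$$S(f) = -A \int_{-\infty}^{\infty} f(x) \ln f(x) \, dx \tag{24}$$

where $A$ is a constant.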



Note that when the PPP (product probability property) or statistical independence
holds then in the continuous case also we have the property in (12) and
(14) and then non-additivity holds for the measures analogous to the ones in
(3), (5), (6), (7) with $a(\alpha)$ remaining the same. Since the steps are parallel a
separate derivation is not given here.

2. Maximum Entropy Principle 

If we have a multinomial population $P = (p_1, \ldots, p_k)$, $p_i \geq 0$, $i = 1, \ldots, k$, $p_1 +
\cdots + p_k = 1$ or the scheme $P(A_i) = p_i$, $A_1 \cup \cdots \cup A_k = S$, $P(S) = 1$, $A_i \cap A_j =
\phi$, $i \neq j$, then we know that the maximum uncertainty in the scheme or the
minimum information from the scheme is obtained when we cannot give any
preference to the occurrence of any particular event or when the events are
equally likely or when $p_1 = p_2 = \cdots = p_k = \frac{1}{k}$. In this case, Shannon entropy
becomes

$$S_k(P) = -A \left[ \frac{1}{k} \ln \frac{1}{k} + \cdots + \frac{1}{k} \ln \frac{1}{k} \right] = A \ln k \tag{25}$$

and this is the maximum uncertainty or maximum Shannon entropy in this
scheme. If the arbitrary functional $f$ is to be fixed by maximizing the entropy
then in (19) to (21) we have to optimize $\int_{-\infty}^{\infty} [f(x)]^{\alpha} dx$ for fixed $\alpha$, over all
functional $f$, subject to the condition $\int_{-\infty}^{\infty} f(x) dx = 1$ and $f(x) \geq 0$ for all $x$.
For applying the calculus of variations procedure we consider the functional

$$U = [f(x)]^{\alpha} - \lambda [f(x)]$$

where $\lambda$ is a Lagrangian multiplier. Then the Euler equation is the following:

$$\frac{\partial U}{\partial f} = 0 \;\Rightarrow\; \alpha f^{\alpha - 1} - \lambda = 0 \;\Rightarrow\; f = \left(\frac{\lambda}{\alpha}\right)^{\frac{1}{\alpha - 1}} = \text{constant}. \tag{26}$$
Hence $f$ is the uniform density in this case, analogous to the equally likely
situation in the multinomial case. If the first moment $E(x) = \int_{-\infty}^{\infty} x f(x) dx$
is assumed to be a given quantity for all functional $f$ then $U$ will become the
following for (19) to (21):

$$U = [f(x)]^{\alpha} - \lambda_1 [f(x)] - \lambda_2 x f(x)$$

and the Euler equation leads to the power law. That is,

$$\frac{\partial U}{\partial f} = 0 \;\Rightarrow\; \alpha f^{\alpha - 1} - \lambda_1 - \lambda_2 x = 0 \;\Rightarrow\; f = c_1 \left[ 1 + \frac{\lambda_2}{\lambda_1} x \right]^{\frac{1}{\alpha - 1}}. \tag{27}$$

By selecting $c_1, \lambda_1, \lambda_2$ appropriately we can create a density out of (27). For
$\alpha > 1$ and $\frac{\lambda_2}{\lambda_1} > 0$ the right side in (27) increases exponentially. If $\alpha = q > 1$ and
$\frac{\lambda_2}{\lambda_1} = q - 1$ then we have Tsallis' $q$-exponential function from the right side of
(27). If $\alpha > 1$ and $\frac{\lambda_2}{\lambda_1} = -(\alpha - 1)$ then (27) can produce a density in the category
of a type-1 beta. From (27) it is seen that the form of the entropies of
Havrda-Charvat $H_{k,\alpha}(P)$ and Tsallis $T_{k,\alpha}(P)$ need special attention to produce densities
(Ferri et al. 2005). However, Tsallis has considered a different constraint on
$E(x)$. If the density $f(x)$ is replaced by its escort density, namely, $\mu [f(x)]^{\alpha}$
where $\mu^{-1} = \int_x [f(x)]^{\alpha} dx$, and if the expected value of $x$ in this escort density
is assumed to be fixed for all functional $f$ then the $U$ of (26) becomes

$$U = f^{\alpha} - \lambda_1 f + \mu \lambda_2 x f^{\alpha}$$

$$\Rightarrow\; \frac{\partial U}{\partial f} = \alpha f^{\alpha - 1} [1 + \mu \lambda_2 x] - \lambda_1 = 0 \;\Rightarrow\; f = \frac{(\lambda_1 / \alpha)^{\frac{1}{\alpha - 1}}}{[1 + \lambda_3 x]^{\frac{1}{\alpha - 1}}}$$

$$\Rightarrow\; f = \lambda_1^* [1 + \lambda_3 x]^{-\frac{1}{\alpha - 1}}$$

where $\lambda_3$ is a constant and $\lambda_1^*$ is the normalizing constant. If $\lambda_3$ is taken as
$\lambda_3 = \alpha - 1$ then

$$f = \lambda_1^* [1 + (\alpha - 1) x]^{-\frac{1}{\alpha - 1}}. \tag{28}$$

Then (28) for $\alpha > 1$ is Tsallis statistics (Tsallis 2004, Cohen 2005). Then for
$\alpha < 1$ also, by writing $\alpha - 1 = -(1 - \alpha)$, one gets the case of Tsallis statistics
for $\alpha < 1$ (Ferri et al. 2005). These modifications and the consideration of
the escort distribution are not necessary if we take the generalized entropy of order
$\alpha$. Thus if we consider $M_\alpha(f)$ and if we assume that the first moment in $f(x)$
itself is fixed for all functional $f$ then the Euler equation gives

$$(2 - \alpha) f^{1 - \alpha} - \lambda_1 + \lambda_2 x = 0 \;\Rightarrow\; f = \hat{\lambda} \left[ 1 - \frac{\lambda_2}{\lambda_1} x \right]^{\frac{1}{1 - \alpha}}$$

and for $\frac{\lambda_2}{\lambda_1} = 1 - \alpha$ we have Tsallis statistics (Tsallis 2004, Cohen 2005)

$$f = \hat{\lambda} [1 - (1 - \alpha) x]^{\frac{1}{1 - \alpha}} \tag{29}$$

coming directly, where $\hat{\lambda}$ is the normalizing constant.

Let us start with $M_\alpha(f)$ of (22) under the assumptions that $f(x) \geq 0$ for all
$x$, $\int_a^b f(x) dx = 1$, $\int_a^b x^{\delta} f(x) dx$ is fixed for all functional $f$ and for a specified
$\delta > 0$, $f(a)$ is the same for all functional $f$, $f(b)$ is the same for all functional
$f$, for some limits $a$ and $b$. Then the Euler equation becomes

$$(2 - \alpha) f^{1 - \alpha} - \lambda_1 - \lambda_2 x^{\delta} = 0 \;\Rightarrow\; f = c_1 [1 + c_1^* x^{\delta}]^{\frac{1}{1 - \alpha}}. \tag{30}$$

If $c_1^*$ is written as $-s(1 - \alpha)$, $s > 0$, then we have, writing $f_1$ for $f$,

$$f_1 = c_1 [1 - s(1 - \alpha) x^{\delta}]^{\frac{1}{1 - \alpha}}, \quad \delta > 0, \; \alpha < 1, \; 0 \leq x \leq \frac{1}{[s(1 - \alpha)]^{\frac{1}{\delta}}} \tag{31}$$

where $1 - s(1 - \alpha) x^{\delta} > 0$. For $\alpha < 1$ or $-\infty < \alpha < 1$ the right side of (31)
remains as a generalized type-1 beta model with the corresponding normalizing
constant $c_1$. For $\alpha > 1$, writing $1 - \alpha = -(\alpha - 1)$, the model in (31) goes to a
generalized type-2 beta form, namely,

$$f_2 = c_2 [1 + s(\alpha - 1) x^{\delta}]^{-\frac{1}{\alpha - 1}}. \tag{32}$$

When $\alpha \to 1$ in (31) or in (32) we have an extended or stretched exponential
form,

$$f_3 = c_3 e^{-s x^{\delta}}. \tag{33}$$

If $c_1^*$ in (30) is taken as positive then (30) for $\alpha < 1$, $\alpha > 1$, $\alpha \to 1$ will be
increasing exponentially. Hence all possible forms are available from (30). The
model in (31) is a special case of the distributional pathway model and for a
discussion of the matrix-variate pathway model see Mathai (2005). Special cases
of (31) and (32) for $\delta = 1$ are Tsallis statistics (Gell-Mann and Tsallis, 2004;
Ferri et al. 2005).
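
The passage from (31) and (32) to the stretched exponential (33) as the order
tends to 1 can be seen numerically on the unnormalized kernels. This is a
minimal sketch; the function pathway_kernel and the chosen parameter values
are hypothetical, and normalizing constants are deliberately left out.

```python
import math


def pathway_kernel(x, alpha, s=1.0, delta=1.0):
    """Unnormalized kernel [1 - s(1 - alpha) x^delta]^(1/(1 - alpha)).

    For alpha < 1 this is the kernel of (31) and is set to 0 outside its
    support; for alpha > 1 it equals the kernel of (32); for alpha == 1 the
    limiting form exp(-s x^delta) of (33) is returned.
    """
    if alpha == 1.0:
        return math.exp(-s * x ** delta)
    base = 1.0 - s * (1.0 - alpha) * x ** delta
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / (1.0 - alpha))


if __name__ == "__main__":
    x, s, delta = 1.5, 2.0, 2.0
    for alpha in (0.5, 0.9, 0.999, 1.0, 1.001, 1.1, 1.5):
        print(alpha, pathway_kernel(x, alpha, s, delta))
```

Moving alpha through 1 switches between the finite-support type-1 beta
family, the stretched exponential, and the heavy-tailed type-2 beta family,
which is the sense in which alpha acts as a pathway parameter.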



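Instead of optimizing $M_\alpha(f)$ of (22) under the conditions that $f(x) \geq 0$
for all $x$, $\int_a^b f(x) dx = 1$ and $\int_a^b x^{\delta} f(x) dx$ is fixed, let us optimize under the
following conditions: $f(x) \geq 0$ for all $x$, $\int_a^b f(x) dx < \infty$ and the following two
moment-like expressions are fixed quantities for all functional $f$,

$$\int_a^b x^{(\gamma - 1)(1 - \alpha)} f(x) dx = \text{fixed}, \quad \int_a^b x^{(\gamma - 1)(1 - \alpha) + \delta} f(x) dx = \text{fixed}.$$

Then the Euler equation becomes

$$(2 - \alpha) f^{1 - \alpha} - \lambda_1 x^{(\gamma - 1)(1 - \alpha)} - \lambda_2 x^{(\gamma - 1)(1 - \alpha) + \delta} = 0 \;\Rightarrow\; f = c_1 x^{\gamma - 1} [1 + c_1^* x^{\delta}]^{\frac{1}{1 - \alpha}}$$

and for $c_1^* = -s(1 - \alpha)$, $s > 0$, we have the distributional pathway model for
the real scalar case, namely

$$f(x) = c \, x^{\gamma - 1} [1 - s(1 - \alpha) x^{\delta}]^{\frac{1}{1 - \alpha}}, \quad \delta > 0, \; s > 0 \tag{34}$$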
where $c$ is the normalizing constant. For $\alpha < 1$, (34) gives a generalized type-1
beta form, for $\alpha > 1$ it gives a generalized type-2 beta form and for $\alpha \to 1$
we have a generalized gamma form. For $\alpha > 1$, (34) gives the superstatistics
of Beck (2006) and Beck and Cohen (2003). For $\gamma = 1$, $\delta = 1$, (34) gives
Tsallis statistics (Tsallis 2004, Cohen 2005). Densities appearing in a number
of physical problems are seen to be special cases of (34), a discussion of which
may be seen in Mathai and Haubold (2006a). For example, (34) for $\delta =
2$, $\gamma = 3$, $\alpha \to 1$, $x > 0$ is the Maxwell-Boltzmann density; for $\delta = 2$, $\gamma = 1$, $\alpha \to
1$, $-\infty < x < \infty$ it is the Gaussian density; for $\gamma = \delta$, $\alpha \to 1$ it is the Weibull density.
For $\gamma = 1$, $\delta = 2$, $1 < q < 3$ we have the Wigner function $W(p)$ giving the atomic
momentum distribution in the framework of the Fokker-Planck equation, see Douglas,
Bergamini, and Renzoni (2006), where

$$W(p) = z_q^{-1} [1 - \beta(1 - q) p^2]^{\frac{1}{1 - q}}, \quad 1 < q < 3. \tag{35}$$

Before closing this section we may observe one more property for $M_\alpha(f)$. As
an expected value

$$M_\alpha(f) = \frac{1}{\alpha - 1} \left[ E[f(x)]^{1 - \alpha} - 1 \right]. \tag{36}$$

But Kerridge's (Kerridge, 1961) measure of "inaccuracy" in assigning $q(x)$ for
the true density $f(x)$, in the generalized form, is

$$H_\alpha(f : q) = \frac{1}{2^{1 - \alpha} - 1} \left[ E[q(x)]^{\alpha - 1} - 1 \right], \tag{37}$$

which is also connected to the measure of directed divergence between $q(x)$ and
$f(x)$. In (37) the normalizing constant is $2^{1 - \alpha} - 1$, the same factor appearing in
Havrda-Charvat entropy. With different normalizing constants, as seen before,
(36) and (37) have the same forms as an expected value with $q(x)$ replaced
by $f(x)$ in (36). Hence $M_\alpha(f)$ can also be looked upon as a type of directed
divergence or "inaccuracy" measure.

3. Differential Equations 

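The functional part in (34), for a more general exponent, namely

$$g(x) = \frac{f(x)}{c} = x^{\gamma - 1} [1 - s(1 - \alpha) x^{\delta}]^{\frac{\beta}{1 - \alpha}}, \quad \alpha \neq 1, \; \delta > 0, \; \beta > 0, \; s > 0 \tag{38}$$

is seen to satisfy the following differential equation for $\gamma \neq 1$, which defines the
differential pathway:

$$x \frac{d}{dx} g(x) = (\gamma - 1) x^{\gamma - 1} [1 - s(1 - \alpha) x^{\delta}]^{\frac{\beta}{1 - \alpha}} - s \beta \delta x^{\delta + \gamma - 1} [1 - s(1 - \alpha) x^{\delta}]^{-\left[1 - \frac{\beta}{1 - \alpha}\right]}. \tag{39}$$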
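Then for $\delta = \frac{(\gamma - 1)(\alpha - 1)}{\beta}$, $\gamma \neq 1$, $\alpha > 1$ we have

$$x \frac{d}{dx} g(x) = (\gamma - 1) g(x) - s \beta \delta [g(x)]^{1 - \frac{(1 - \alpha)}{\beta}} \tag{40}$$

$$= (\gamma - 1) g(x) - s \delta [g(x)]^{\alpha} \tag{41}$$

for $\beta = 1$, $\gamma \neq 1$, $\delta = (\gamma - 1)(\alpha - 1)$, $\alpha > 1$.

For $\gamma = 1$, $\delta = 1$ in (38) we have

$$\frac{d}{dx} g(x) = -s \beta [g(x)]^{\eta}, \quad \eta = 1 - \frac{(1 - \alpha)}{\beta} \tag{42}$$

$$= -s [g(x)]^{\alpha} \text{ for } \beta = 1. \tag{43}$$

Here (43) is the power law coming from Tsallis statistics (Gell-Mann and Tsallis,
2004).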
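
The differential equations satisfied by g(x) in (38) can be checked with a
central finite difference. The sketch below assumes the general form with
the factor beta multiplying s on the right side, which is what direct
differentiation of (38) gives; the helper names and the step size h are
hypothetical choices.

```python
def g(x, alpha, gamma, delta, beta, s):
    """The function g(x) of (38)."""
    return x ** (gamma - 1) * (1 - s * (1 - alpha) * x ** delta) ** (beta / (1 - alpha))


def derivative(fn, x, h=1e-6):
    """Central finite-difference approximation to fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def residual_40(x, alpha, gamma, beta, s):
    """x g'(x) - [(gamma - 1) g - s beta delta g^eta], with delta tied to gamma, alpha, beta."""
    delta = (gamma - 1) * (alpha - 1) / beta
    eta = 1 - (1 - alpha) / beta
    gx = g(x, alpha, gamma, delta, beta, s)
    lhs = x * derivative(lambda t: g(t, alpha, gamma, delta, beta, s), x)
    rhs = (gamma - 1) * gx - s * beta * delta * gx ** eta
    return lhs - rhs


def residual_43(x, alpha, s):
    """g'(x) + s g^alpha for gamma = 1, delta = 1, beta = 1."""
    gx = g(x, alpha, 1.0, 1.0, 1.0, s)
    return derivative(lambda t: g(t, alpha, 1.0, 1.0, 1.0, s), x) + s * gx ** alpha


if __name__ == "__main__":
    print(abs(residual_40(0.8, alpha=1.5, gamma=2.0, beta=1.0, s=1.0)) < 1e-6)
    print(abs(residual_40(0.8, alpha=1.5, gamma=2.0, beta=2.0, s=1.0)) < 1e-6)
    print(abs(residual_43(0.8, alpha=1.5, s=2.0)) < 1e-6)
```

With beta = 1 the first residual corresponds to (41) and the last one to the
power law (43).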